An Active Learning Method for Speaker Identity Annotation in Audio Recordings

نویسندگان

  • Pierre-Alexandre Broux
  • David Doukhan
  • Simon Petit-Renaud
  • Sylvain Meignier
  • Jean Carrive
چکیده

Given that manual annotation of speech is an expensive and long process, we attempt in this paper to assist an annotator to perform a speaker diarization. This assistance takes place in an annotation background for a large amount of archives. We propose a method which decreases the intervention number of a human. This method corrects a diarization by taking into account the human interventions. The experiment is done using French broadcast TV shows drawn from ANR-REPERE evaluation campaign. Our method is mainly evaluated in terms of KSR (Keystroke Saving Rate), and we reduce the number of actions needed to correct a speaker diarization output by 6.8% in absolute value.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

AV16.3: An Audio-Visual Corpus for Speaker Localization and Tracking

Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the groundtruth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, vi...

متن کامل

The Ta2 Database - a Multi-modal Database from Home Entertainment

This paper presents a new database containing highdefinition audio and video recordings in a rather unconstrained video-conferencing-like environment. The database consists of recordings of people sitting around a table in two separate rooms communicating and playing online games with each other. Extensive annotation of head positions, voice activity and word transcription has been performed on...

متن کامل

A Multi-Modal Database from Home Entertainment

This paper presents a new database containing highdefinition audio and video recordings in a rather unconstrained video-conferencing-like environment. The database consists of recordings of people sitting around a table in two separate rooms communicating and playing online games with each other. Extensive annotation of head positions, voice activity and word transcription has been performed on...

متن کامل

Variable print quality

In the literature, much research work has been done in the area of speaker verification. The developments include: different types of speaker verification techniques, methods for feature extraction, measures for telephone channel compensation, system robustness etc. In contrast, the problem of acoustic feature selection for speaker verification has been relatively neglected. Hence our aim is to...

متن کامل

The SRI Speech-Based Collaborative Learning Corpus

We introduce the SRI speech-based collaborative learning corpus, a novel collection designed for the investigation and measurement of how students collaborate together in small groups. This is a multi-speaker corpus containing high-quality audio recordings of middle school students working in groups of three to solve mathematical problems. Each student was recorded via a head-mounted noise-canc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016